Chinese Novelty Mining
Abstract
Automated mining of novel documents or sentences from chronologically ordered documents or sentences is an open challenge in text mining. In this paper, we describe preprocessing techniques for detecting novel Chinese text and discuss how different part-of-speech (POS) filtering rules affect detection performance. Experimental results on APWSJ and TREC 2004 Novelty Track data show that Chinese novelty mining performance differs considerably between two dissimilar POS filtering rules. The selection of words used to represent Chinese text is therefore of vital importance to the success of Chinese novelty mining. Moreover, we compare Chinese novelty mining performance with that of English and investigate the impact of the preprocessing steps on detecting novel Chinese text, which will be helpful for developing a Chinese novelty mining system.
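To make the role of POS filtering concrete, the following is a minimal sketch, not the authors' method. It assumes sentences arrive already segmented and POS-tagged, scores novelty as one minus the maximum cosine similarity to any earlier sentence, and flags a sentence as novel when that score reaches a threshold. The tag names, the two rule sets, the threshold value, and all function names are hypothetical choices made for illustration; the abstract does not specify any of them.

```python
from collections import Counter
from math import sqrt

# Hypothetical POS filtering rules (tag names are assumptions, not from the paper).
RULE_NOUNS_VERBS = {"n", "v"}        # keep nouns and verbs only
RULE_WITH_TIME = {"n", "v", "t"}     # additionally keep time words


def pos_filter(tagged_tokens, allowed_tags):
    """Keep only the words whose POS tag is in allowed_tags."""
    return [word for word, tag in tagged_tokens if tag in allowed_tags]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mine_novel(tagged_sentences, allowed_tags, threshold=0.3):
    """Return indices of sentences judged novel, in chronological order.

    Novelty score = 1 - max cosine similarity to any earlier sentence
    (an assumed metric; the threshold is likewise an assumption).
    """
    history = []
    novel = []
    for i, tagged in enumerate(tagged_sentences):
        vec = Counter(pos_filter(tagged, allowed_tags))
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        if 1.0 - max_sim >= threshold:
            novel.append(i)
        history.append(vec)
    return novel


# Toy, hand-tagged input: (word, tag) pairs per sentence.
sentences = [
    [("股市", "n"), ("今天", "t"), ("上涨", "v")],
    [("股市", "n"), ("昨天", "t"), ("上涨", "v")],
    [("公司", "n"), ("发布", "v"), ("新", "a"), ("产品", "n")],
]

novel_nv = mine_novel(sentences, RULE_NOUNS_VERBS)
novel_time = mine_novel(sentences, RULE_WITH_TIME)
```

On this toy input the nouns-and-verbs rule treats the second sentence as a repeat of the first, while the rule that also keeps time words flags it as novel, which illustrates how the choice of filtering rule can change the outcome.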
Similar resources
Adaptable Services for Novelty Mining
Novelty mining is the process of mining relevant information on a given topic. However, designing adaptable services for real-world novelty mining faces several challenges like real-time processing of incoming documents, computational efficiency, multi-user working environment, diverse system requirements, and integration of domain knowledge from different users. In this paper, the authors brid...
Mobile Novelty Mining
Service-oriented Web applications allow users to exploit applications over networks and access them from a remote system at the client side, including mobile phones. Individual services are built separately with comprehensive functionalities. In this article, the authors transform a standalone offline novelty mining application into a service-oriented application and allow users to access it ov...
Blended metrics for novel sentence mining
With the abundance of raw text documents available on the internet, many articles contain redundant information. Novel sentence mining can discover novel, yet relevant, sentences given a specific topic defined by a user. In real-time novelty mining, an important issue is how to select a suitable novelty metric that quantitatively measures the novelty of a particular sentence. To utilize the ...
Novelty detection: a review - part 1: statistical approaches
Novelty detection is the identification of new or unknown data or signal that a machine learning system is not aware of during training. Novelty detection is one of the fundamental requirements of a good classification or identification system since sometimes the test data contains information about objects that were not known at the time of training the model. In this paper we provide state-of-...
Rejecting the arguments of the sanctity of bitcoin mining and proving its legitimacy by Reward Contract (Joaleh)
Bitcoin soon attracted the attention of experts and the general public around the world, including the Islamic community. Due to the novelty of the subject, although little research has been done to examine the legitimacy of bitcoin mining from the perspective of Muslim thinkers, this paper is responsible for examining two reasons in the research of contemporary Sunni thinkers. The two reasons ...